Abstract - MIMO systems provide spatial multiplexing gain
and diversity gain. The maximum likelihood (ML) solution is
the optimal decoder for MIMO systems. However, the
computational complexity of such a lattice decoder is high in
terms of hardware implementation, and its throughput is
variable. In this paper, the sphere decoding (SD) algorithm is
considered; in its breadth-first (K-best) form it reduces the
computational complexity and delivers a constant throughput,
which real-time applications require. The sphere decoder
therefore leads to more efficient and realizable hardware.
Such an algorithm increases on-chip speed and can be extended
so that the operation is performed in fewer cycles. For a
system with 4 transmit and 4 receive antennas using QAM, a
high decoding throughput (in Mbps) and a low BER can be
achieved. In addition, the BER performance of the sphere
decoder is close to that of the maximum likelihood solution.

Index Terms - lattice decoding, sphere decoding, maximum
likelihood, K-best SD, breadth-first search.


I. Introduction 

This section introduces MIMO detection and the related
literature. It presents the MIMO system and the tradeoff
between spatial multiplexing gain and diversity gain.
Section II gives a general classification of MIMO decoders
and the techniques that employ them, and highlights their
pros and cons. Section III describes sphere decoding in
detail, including tree traversal and radius reduction under
the constraints of SD, and explains the difference between
depth-first and breadth-first tree search. Section IV
presents lattice decoders and the K-best algorithm,
implemented with and without sorting. Section V presents the
hardware architecture and simulation results.

A. MIMO Decoders 

Recently, the need for higher transmission rates with fewer
transmission errors has increased in wireless communication.
The use of multiple antennas is therefore the need of the
hour, and next-generation wireless networks have emerged to
offer higher transmission rates with fewer errors.
Multiple-antenna systems increase the spectral efficiency of
the system through diversity techniques and spatial
multiplexing (SM).
In [1] a basic transmit diversity scheme is developed for
two transmit antennas, while in [2] the diversity gain is
increased using four transmit antennas with orthogonal
codes. In past research, the channel is assumed to be
uncorrelated, and maximum-likelihood detection is employed
at the receiver together with combining techniques.
On the other hand, the spectral efficiency of the system is
increased by employing spatial multiplexing (SM) [3], which
opens multiple spatial data pipes between transmitter and
receiver without any additional bandwidth or power
requirement. MIMO systems add a spatial dimension to
existing rate adaptation algorithms: the MIMO transmission
type (STBC, spatial multiplexing, or a hybrid approach) must
be chosen in addition to the modulation and coding type.
However, in MIMO systems, correlations may occur between
channel coefficients due to insufficient antenna spacing and
the scattering properties of the transmission environment.
This may lead to significant degradation in system
performance. Adding more antennas to the base station and/or
the subscriber unit therefore requires more physical space
at that unit in order to keep the channel between antenna
elements uncorrelated. Hence, it would not be feasible to
design higher-order MIMO systems in small handsets.
Dual-polarized antenna elements have been introduced as a
space- and cost-effective alternative: the information
streams are sent through the vertical and horizontal
polarizations of the antenna elements at the same time and
frequency, without any additional power or bandwidth
requirement.

However, as pointed out in [4], imperfections of the
transmit and/or receive antennas and the XPD factor, which
is the power ratio of the co-polar and cross-polar
components, degrade the system performance considerably.
In [5], a system employing one dual-polarized antenna at the
transmitter and one dual-polarized antenna at the receiver
is presented, and the error performance of 2-antenna SM and
STBC transmission schemes is derived for this virtual MIMO
system. Notice that, in [5], a single-input single-output
(SISO) system is enabled with MIMO capabilities through the
use of dual-polarized antennas. In IEEE 802.11n and WiMAX
systems, 2x1 and 2x2 antenna configurations are used with
Alamouti and SM transmission techniques; higher-order MIMO
systems, although defined in the standards, are not used
because of the space problem. The system throughput is
controlled by these two MIMO options together with the
modulation and coding schemes (MCS) defined in the
standards. However, when dual-polarized antenna elements are
used at both link ends, a virtual 4x4 MIMO system is
obtained, in which the hybrid MIMO options can also be used
to maximize the detection performance and multiplexing gain
of the system.
When a virtual 4x4 MIMO system is combined with MCSs that
include large constellation sizes, a tremendous
computational effort is required if optimal detection
techniques are employed at the receiver, especially for SM
and hybrid transmission techniques. A low-complexity yet
effective receiver is therefore needed.

Recently, considerable research effort has gone into
suppressing undesired transmission effects through
low-complexity iterative equalization methods based on the
joint use of linear MMSE filtering and the SIC process. For
instance, in [6] multiple-access interference is cancelled
for CDMA systems, while in [7] ISI effects are suppressed
for single-antenna systems. However, when the channel memory
length is large, time-domain equalization requires
considerable computational effort because of the matrix
inversion. Therefore, frequency-domain equalization was
introduced in the literature, and low-complexity iterative
frequency-domain equalization is studied in [8] [9].
Moreover, in OFDM systems equalization is performed in the
frequency domain, where frequency-selective fading channels
become frequency-flat fading channels because the symbols
are sent over orthogonal subcarriers. By adding a cyclic
prefix at least as long as the channel, OFDM solves the ISI
problem. For that reason, OFDM is used as a standard
technique in IEEE 802.11n and WiMAX systems. When the four
dual-polarized MIMO transmission techniques are combined
with the MCSs introduced in the IEEE 802.11n and WiMAX
standards, a transmission channel can be fully utilized via
a proper adaptive switching mechanism. In this regard, we
employ the standard link adaptation technique given in [10],
where SNR information is defined as a link quality indicator
and the transmission parameters are adapted to the current
channel conditions according to the SNR knowledge.

Diversity gain and spatial multiplexing gain are
related to system coverage range and data rate,
respectively. Both gains can be improved by using a larger
antenna array. However, for a given MIMO system, there is a
fundamental tradeoff between these two gains. In the
diversity-multiplexing space, the repetition code, the
Alamouti code, and space-time codes use data redundancy to
increase diversity at the price of losing spatial
multiplexing gain. In contrast, the Bell Labs Layered
Space-Time (BLAST) algorithm, singular value decomposition
(SVD), and QR decomposition allocate data streams to
different eigenmodes to maximize spatial multiplexing gain
while sacrificing diversity gain, as shown in Fig. 1.

Sphere decoding is a decoding scheme that can
extract both diversity and multiplexing gains. With
flexibility in coding and modulation, the sphere decoder can
effectively explore the entire tradeoff curve, as shown in
Fig. 1.

B. Basic Model for MIMO System 

Our equivalent complex-valued discrete-time baseband
system model is as follows:

$$ y = Hs + n \qquad (1) $$

where $H$ denotes the $M_r \times M_t$ channel matrix,
$s = [s_1\ s_2\ \ldots\ s_{M_t}]^T$ is the $M_t$-dimensional
transmit signal vector, and $n$ stands for the
$M_r$-dimensional additive i.i.d. circularly symmetric
complex Gaussian noise vector. The entries of $s$ are chosen
independently from a complex constellation $\mathcal{O}$
with $Q$ bits per symbol, i.e. $|\mathcal{O}| = 2^Q$. The
set of all possible transmitted vector symbols is denoted by
$\mathcal{O}^{M_t}$. The corresponding uncoded transmission
rate is $R = M_t Q$ bits per channel use (bpcu).

Fig. 1. Diversity vs. multiplexing tradeoff in MIMO
communications [11]. (Axis label: spatial multiplexing gain
(rate).)

Fig. 2. Basic Model of MIMO System [11]
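
As an illustration of model (1), the following minimal Python sketch draws one channel use for a 4x4 system with 16-QAM and computes the uncoded rate $R = M_t Q$. The helper names (`qam16`, `simulate_channel_use`), the unnormalised constellation levels, the i.i.d. Rayleigh channel and the noise level are assumptions made for this example only; they are not taken from the paper.

```python
import random

def qam16():
    """16-QAM constellation with levels {-3, -1, 1, 3} per axis (unnormalised)."""
    levels = (-3, -1, 1, 3)
    return [complex(re, im) for re in levels for im in levels]

def simulate_channel_use(Mt, Mr, constellation, noise_std, rng):
    """Draw one realisation of y = H s + n (illustrative assumptions only)."""
    # Assumed channel: i.i.d. complex Gaussian entries with unit variance.
    H = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) / 2 ** 0.5
          for _ in range(Mt)] for _ in range(Mr)]
    # Entries of s are chosen independently from the constellation.
    s = [rng.choice(constellation) for _ in range(Mt)]
    # Circularly symmetric complex Gaussian noise.
    n = [complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
         for _ in range(Mr)]
    y = [sum(H[r][t] * s[t] for t in range(Mt)) + n[r] for r in range(Mr)]
    return H, s, y

rng = random.Random(0)
O = qam16()
Q = (len(O) - 1).bit_length()   # bits per symbol: 4 for 16-QAM
Mt = Mr = 4
H, s, y = simulate_channel_use(Mt, Mr, O, noise_std=0.1, rng=rng)
R = Mt * Q                      # uncoded rate in bits per channel use
print(f"|O| = {len(O)}, Q = {Q}, R = {R} bpcu")
```

For the 4x4, 16-QAM case this gives $R = 16$ bpcu, the rate used in the complexity discussion of Section II.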


II. Classification of MIMO Decoders 



Fig. 3. Classification of MIMO Decoder

We assume that the receiver has acquired knowledge of
the channel H (e.g., through a preceding training phase).
Algorithms to separate the parallel data streams
corresponding to the transmit antennas can be divided into
four categories.

1. Linear detection methods invert the channel matrix
using a zero-forcing (ZF) or minimum mean squared
error (MMSE) criterion. The received vectors are then
multiplied by the channel inverse, possibly followed by
slicing. The drawback is, in general, a rather poor
bit-error-rate (BER) performance.

2. Ordered successive interference cancellation decoders
such as the vertical Bell Laboratories layered
space-time (V-BLAST) algorithm show slightly better
performance, but suffer from error propagation and are
still suboptimal.

3. Maximum-likelihood (ML) detection, which solves

$$ \hat{s} = \arg\min_{s \in \mathcal{O}^{M_t}} \| y - Hs \|^2 \qquad (2) $$

is the optimum detection method and minimizes the
BER. A straightforward approach to solve (2) is an
exhaustive search. Unfortunately, the corresponding
computational complexity grows exponentially with the
transmission rate $R$, since the detector needs to examine
$2^R$ hypotheses for each received vector. While the
implementation of exhaustive-search ML has been
shown to be feasible in the low-rate regime $R \le 8$ bpcu,
complexity quickly becomes unmanageable as the rate
increases. For example, in a 4 x 4 system (i.e.
$M_r = M_t = 4$) with 16-QAM modulation (corresponding to
$R = 16$ bpcu), $2^{16} = 65{,}536$ candidate vector symbols
have to be considered for each received vector.

4. Sphere decoding (SD) solves the ML detection problem
[12] [13]. While the algorithm has a nondeterministic
instantaneous throughput, its average complexity was
shown to be polynomial in the rate [14] for moderate
rates, but still exponential in the limit of high rates.
However, these asymptotic results do not properly
reflect the true implementation complexity of the
algorithm, which for most practical cases is still
significantly lower than that of an exhaustive search. The
algorithm is thus widely considered the most promising
approach toward the realization of ML detection in
high-rate MIMO systems. Ever since its introduction
and its application to wireless communications in [13],
reduction of the computational complexity of the
algorithm has received significant attention [13] [15].
However, most modifications of the algorithm proposed
in the literature so far have been suggested with digital
signal processor (DSP) implementations in mind. Little
attention has been paid to the efficient VLSI
implementation of the SD algorithm and the associated
performance tradeoffs.
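
To make the exhaustive-search cost of (2) concrete, here is a minimal Python sketch of brute-force ML detection together with the hypothesis count $|\mathcal{O}|^{M_t} = 2^R$. The function names and the small 2x2 QPSK example (with a hand-picked channel and no noise) are assumptions chosen to keep the example short; they do not come from the paper.

```python
from itertools import product

def ml_detect(H, y, constellation):
    """Exhaustive-search ML: minimise ||y - H s||^2 over all s in O^Mt."""
    Mr, Mt = len(H), len(H[0])
    best_s, best_d = None, float("inf")
    for s in product(constellation, repeat=Mt):      # every hypothesis
        d = sum(abs(y[r] - sum(H[r][t] * s[t] for t in range(Mt))) ** 2
                for r in range(Mr))
        if d < best_d:
            best_s, best_d = s, d
    return list(best_s), best_d

def num_hypotheses(constellation_size, Mt):
    """Number of candidate vectors an exhaustive search must examine."""
    return constellation_size ** Mt

# Illustrative 2x2 QPSK example with an arbitrary channel (assumption).
qpsk = [complex(a, b) for a in (-1, 1) for b in (-1, 1)]
H = [[1 + 0.5j, 0.3 - 0.2j],
     [-0.4 + 0.1j, 0.9 + 0.7j]]
s_true = [qpsk[1], qpsk[2]]
y = [sum(H[r][t] * s_true[t] for t in range(2)) for r in range(2)]  # noise-free
s_hat, d = ml_detect(H, y, qpsk)

print(num_hypotheses(len(qpsk), 2))   # 16 hypotheses for 2x2 QPSK
print(num_hypotheses(16, 4))          # 65536 hypotheses for 4x4 16-QAM
```

The loop body is cheap, but the number of iterations grows exponentially with the rate, which is the motivation for the tree-search methods discussed next.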

III. Basics of Sphere Decoding 

In this section, we briefly review the basics of SD, and
we outline what we consider to be the corresponding state of
the art. Our description summarizes the original algorithm
[16], introduced by Pohst, and its subsequent extensions and
improvements [13] [17] [15]. We distinguish four key
concepts, which we describe in the following.

A. Sphere Constraint 

The main idea in SD is to reduce the number of
candidate vector symbols to be considered in the search that
solves (2), without accidentally excluding the ML solution.
This goal is achieved by constraining the search to only
those points $Hs$ that lie inside a hypersphere with radius
$r$ around the received point $y$. The corresponding
inequality is referred to as the sphere constraint (SC)

$$ d(s) < r^2 \quad \text{where} \quad d(s) = \| y - Hs \|^2 \qquad (3) $$

B. Tree Pruning Strategies 

Only imposing the SC (3) does not lead to complexity
reductions, as the challenge has merely been shifted from
finding the closest point to identifying points that lie
inside the sphere. Hence, complexity is only reduced if the
SC can be checked other than by again exhaustively searching
through all possible transmit vector symbols
$s \in \mathcal{O}^{M_t}$. Two key elements allow for such a
computationally efficient solution.
1. Computing Metric

The channel matrix $H$ in (3) can be triangularized using a
QR decomposition according to $H = QR$, where the
$M_r \times M_t$ matrix $Q$ has orthonormal columns and the
$M_t \times M_t$ matrix $R$ is upper triangular. The
distance in (3) can then be written as

$$ d(s) = c + \| \tilde{y} - Rs \|^2, \qquad \tilde{y} = Q^H y = R\, s_{ZF} \qquad (4) $$

where $s_{ZF}$ is the zero-forcing (or unconstrained ML)
solution $s_{ZF} = H^{\dagger} y$ ($H^{\dagger}$ is the
pseudo-inverse of $H$). The constant $c$ is independent of
the vector symbol and can hence be ignored in the metric
computation. In the following, for simplicity of exposition,
we set $c = 0$. If we build a tree such that the leaves at
the bottom correspond to all possible vector symbols $s$ and
the possible values of the entry $s_{M_t}$ define its top
level, we can uniquely describe each node at level $i$
($i = 1, 2, \ldots, M_t$) by the partial vector symbols
$s^{(i)} = [s_i\ s_{i+1}\ \ldots\ s_{M_t}]^T$.

Now, we can recursively compute the (squared) distance
$d(s)$ by traversing down the tree and effectively
evaluating $\| \tilde{y} - Rs \|^2$ in (4) in a row-by-row
fashion. We start at level $i = M_t$ and set
$T_{M_t+1}(s^{(M_t+1)}) = 0$. The partial (squared)
Euclidean distances (PEDs) $T_i(s^{(i)})$ are then given by

$$ T_i(s^{(i)}) = T_{i+1}(s^{(i+1)}) + | e_i(s^{(i)}) |^2 \qquad (5) $$

with $i = M_t, M_t - 1, \ldots, 1$, where the distance
increments $| e_i(s^{(i)}) |^2$ can be obtained as

$$ | e_i(s^{(i)}) |^2 = \Big| \tilde{y}_i - \sum_{j=i}^{M_t} R_{ij} s_j \Big|^2 \qquad (6) $$

We can make the influence of $s_i$ more explicit by writing

$$ | e_i(s^{(i)}) |^2 = | b_{i+1}(s^{(i+1)}) - R_{ii} s_i |^2 \qquad (7) $$

$$ b_{i+1}(s^{(i+1)}) = \tilde{y}_i - \sum_{j=i+1}^{M_t} R_{ij} s_j \qquad (8) $$

Finally, $d(s)$ is the PED of the corresponding leaf:
$d(s) = T_1(s)$. Since the distance increments
$| e_i(s^{(i)}) |^2$ are nonnegative, it follows immediately
that whenever the PED of a node violates the (partial) SC
given by

$$ T_i(s^{(i)}) < r^2 \qquad (9) $$

then the PEDs of all of its children will also violate the SC.
Consequently, the tree can be pruned above this node. This
approach effectively reduces the number of transmit vector
symbols (i.e. leaves of the tree) to be checked.
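
The row-by-row PED recursion described above can be sketched in a few lines of Python. The sketch assumes a real-valued, already triangularized model (an upper-triangular `R` and a vector `y_tilde` are given directly); the function name and the numerical values are hypothetical and serve only to check that the leaf PED equals the full squared distance.

```python
def partial_distances(R, y_tilde, s):
    """Return the PEDs T_1, ..., T_Mt (as a 0-indexed list) for a candidate s.

    R is assumed upper triangular; y_tilde plays the role of Q^H y.
    """
    Mt = len(s)
    T = [0.0] * (Mt + 1)                 # T[Mt] stands for T_{Mt+1} = 0
    for i in range(Mt - 1, -1, -1):      # levels Mt, Mt-1, ..., 1
        # Contribution of the already fixed symbols s_{i+1}, ..., s_Mt
        b = y_tilde[i] - sum(R[i][j] * s[j] for j in range(i + 1, Mt))
        e = b - R[i][i] * s[i]           # distance increment at this level
        T[i] = T[i + 1] + abs(e) ** 2    # PED recursion
    return T[:Mt]

# Hypothetical 3x3 real-valued example.
R = [[2.0, 0.5, -1.0],
     [0.0, 1.5, 0.25],
     [0.0, 0.0, 1.0]]
y_tilde = [0.7, -1.2, 2.9]
s = [1, -1, 3]

T = partial_distances(R, y_tilde, s)
full = sum((y_tilde[i] - sum(R[i][j] * s[j] for j in range(3))) ** 2
           for i in range(3))
print(T, full)
```

Because every increment is nonnegative, the list `T` is non-increasing from the leaf (`T[0]`) towards the top level (`T[-1]`), which is exactly what makes pruning on the partial SC safe.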

2. Tree Traversal and Radius Reduction 

When the tree traversal is finished, the leaf with the
lowest $T_1(s)$ corresponds to the ML solution. The traversal
can be performed breadth-first or depth-first. In both cases, 
the number of nodes reached and hence the decoding 
complexity depends critically on the choice of the radius r. 
The k-best algorithm [18] [19] approximates a breadth-first 
search by keeping only up to k nodes with the smallest 
PEDs at each level. The advantage of the k-best algorithm 
over a full (depth-first or breadth-first) search is its uniform 
data path and a throughput that is independent of the 
channel realization and the SNR. However, the k-best 
algorithm does not necessarily yield the ML solution. 

In a depth-first implementation, the complexity and 
dependence of the throughput on the initial radius can be 
reduced by shrinking the radius r whenever a leaf is reached. 
This procedure does not compromise the optimality of the 
algorithm, yet it decreases the number of visited nodes 
compared to a constant radius procedure. As an added 
advantage of the depth-first approach with radius reduction, 
the initial radius may be set to infinity, alleviating the 
problem of initial radius choice. However, in contrast to the 
k-best algorithm, a depth-first traversal does not yield a 
deterministic throughput. Hence, the breadth-first algorithm
is used for real-time applications.
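
A depth-first search with radius reduction can be sketched as follows. This is a minimal illustration, not the implementation evaluated in the paper: it assumes a real-valued triangularized model, starts from an infinite radius, and simply sorts the children of each node by PED. The function name, the alphabet and the numerical values are assumptions.

```python
def sphere_decode_dfs(R, y_tilde, alphabet):
    """Depth-first sphere decoder with radius reduction (real-valued sketch).

    Returns (best_s, best_squared_distance, visited_nodes).
    """
    Mt = len(y_tilde)
    best = {"s": None, "r2": float("inf")}   # initial radius set to infinity
    visited = 0
    s = [None] * Mt

    def descend(i, T_parent):
        nonlocal visited
        b = y_tilde[i] - sum(R[i][j] * s[j] for j in range(i + 1, Mt))
        # Children in ascending order of their PEDs.
        children = sorted((T_parent + (b - R[i][i] * a) ** 2, a)
                          for a in alphabet)
        for T, a in children:
            if T >= best["r2"]:      # partial sphere constraint violated;
                break                # the remaining siblings are no better
            visited += 1
            s[i] = a
            if i == 0:               # leaf reached: shrink the radius
                best["s"], best["r2"] = list(s), T
            else:
                descend(i - 1, T)

    descend(Mt - 1, 0.0)
    return best["s"], best["r2"], visited

# Hypothetical 3x3 real-valued example with a 4-level alphabet.
R = [[2.0, 0.5, -1.0],
     [0.0, 1.5, 0.25],
     [0.0, 0.0, 1.0]]
y_tilde = [0.7, -1.2, 2.9]
alphabet = (-3, -1, 1, 3)
s_hat, d, visited = sphere_decode_dfs(R, y_tilde, alphabet)
print(s_hat, d, visited)
```

The number of visited nodes depends on `y_tilde` and `R`, which illustrates why the throughput of a depth-first decoder is not deterministic.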

C. Admissible Sets

The admissible set of children $s^{(i)}$ of a particular
parent $s^{(i+1)}$ in the tree is simply defined by the
constellation points $s_i$ for which the PED satisfies
$T_i(s^{(i)}) < r^2$. In the case of real-valued
constellations, one can determine the boundaries of an
admissible interval using (5) in conjunction with (4) and
the partial SC (9). All admissible children are then
contained within these boundaries. Unfortunately, in the
practically more relevant case of complex-valued
constellations, admissible intervals cannot be specified. A
solution for QAM constellations that is frequently found in
the literature is to decompose the $M_t$-dimensional complex
signal model into a $2M_t$-dimensional real-valued problem
according to

$$ \begin{bmatrix} \Re\{y\} \\ \Im\{y\} \end{bmatrix} = \begin{bmatrix} \Re\{H\} & -\Im\{H\} \\ \Im\{H\} & \Re\{H\} \end{bmatrix} \begin{bmatrix} \Re\{s\} \\ \Im\{s\} \end{bmatrix} + \begin{bmatrix} \Re\{n\} \\ \Im\{n\} \end{bmatrix} \qquad (10) $$

This approach results in a tree that is twice as deep as the
original tree (corresponding to the complex-valued
formulation) with a smaller number of children per node.
The number of leaves remains unchanged. However, we will
argue later that performing SD directly on the complex
constellation is more efficient in VLSI implementations.
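
The real-valued decomposition of the complex model can be checked numerically with a short Python sketch. It uses the standard stacking of real and imaginary parts; the function names and the 2x2 example values are assumptions made for illustration.

```python
def real_valued_decomposition(H, y):
    """Map complex H (Mr x Mt) and y (Mr) to real H_r (2Mr x 2Mt), y_r (2Mr)."""
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H]
    y_r = [v.real for v in y] + [v.imag for v in y]
    return top + bottom, y_r

def stack_real(v):
    """Stack real parts on top of imaginary parts."""
    return [x.real for x in v] + [x.imag for x in v]

# Hypothetical 2x2 example with 16-QAM symbols, noise-free.
H = [[1 + 2j, 0.5 - 1j],
     [-0.3 + 0.4j, 2 - 0.5j]]
s = [3 - 1j, -1 + 3j]
y = [sum(H[r][t] * s[t] for t in range(2)) for r in range(2)]

H_r, y_r = real_valued_decomposition(H, y)
s_r = stack_real(s)
y_check = [sum(H_r[r][t] * s_r[t] for t in range(4)) for r in range(4)]

# Tree shape for a 4x4 16-QAM system: complex tree vs. real-valued tree.
leaves_complex = 16 ** 4        # 4 levels, 16 children per node
leaves_real = 4 ** (2 * 4)      # 8 levels, 4 children per node
print(leaves_complex, leaves_real)
```

Both trees have the same number of leaves; the real-valued tree trades depth (8 levels instead of 4) for a smaller fan-out (4 children instead of 16).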


D. Optimum Ordering

With radius reduction, it is desirable to find candidate
solutions that lie close to the ML solution as early as
possible in order to shrink the sphere as fast as possible and
hence expedite the tree pruning. A scheme proposed by
Schnorr and Euchner [17] and modified for the finite lattice
case in [15] traverses the members of the admissible sets in
ascending order of their PEDs. In the case of real-valued
lattice constellations, given a starting point and an initial
direction, this ordering is predefined. The decoder starts
with the center of the admissible interval and proceeds to the
boundaries in a zigzag fashion. As shown in [15], there is no
need to explicitly compute the boundaries; instead, due to
the Schnorr-Euchner (SE) ordering, it is sufficient to
terminate once the SC is violated. In the case of
complex-valued constellations, SE ordering is still possible
even without the real-valued decomposition (10). However,
depending on the constellation, no obvious predefined order
may exist. Hence, explicit sorting of the admissible children
by their PEDs may be required, incurring a high
implementation complexity.

IV. Typical Lattice Decoder For MIMO Detection
A. Lattice Decoder

A typical lattice decoder for MIMO detection consists of
a pre-processing unit, a pre-decoding unit and a decoding
unit, as shown in Fig. 4. The pre-processing unit takes the
estimated channel matrix $H$ and generates its inverse
$H^{-1}$, a triangular matrix $L$, and a corresponding
optimal ordering $p$ if needed. The task of the pre-decoding
unit is simply to generate a zero-forcing (ZF) point
$z = (H^{-1} x)^T$ as an initial estimate for the decoding
unit. The computational complexity of the pre-decoding unit
is omitted in the following complexity analysis for all the
lattice decoders. The differences among the various lattice
decoders for MIMO detection lie largely in the design of the
decoding unit.

Fig. 4. Typical Lattice Decoder [20]

In a lattice decoder, an $n = 2M$-dimensional lattice is
decomposed into $n$ sublattices. Let $k$ be the dimension of
the sublattice that is currently being investigated, and $y$
the orthogonal distance between two points in adjacent
sublattices. The objective of the decoder is to search for
the lowest possible squared distance bestdist between the
($k = n$)-dimensional and the ($k = 1$)-dimensional
sublattice [12].

In theory, the BER performance of the SE and the SD
algorithms should be the same for MIMO detection, since
the difference between the SE and the SD lies in the
searching order among the sublattices [12]. According to the
searching direction, in turn, lattice decoders can be
divided into two types: the depth-first type with variable
throughput and the breadth-first type with fixed throughput.


B. Breadth first algorithm 

Instead of the mixed metric-first and depth-first
searching scheme, the breadth-first searching scheme can
also be employed for MIMO detection. The breadth-first
algorithm searches for the bestdist in the forward direction
only, but the best K candidate newdist values are kept at
each level of the sublattice. Hence, breadth-first
algorithms can achieve a constant decoding throughput. A
strict breadth-first algorithm should keep K as large as
possible so as not to compromise optimality compared with
the exhaustive-search ML algorithm. However, limiting K
reduces the complexity of the breadth-first algorithm [18]
[21] [22], which is then called the K-best algorithm. The
bit-error rate (BER) performance of the K-best algorithm is
expected to be close to that of the ML algorithm if K is
sufficiently large, as in the well-known M-algorithm for
sequential decoding.

The principle of the K-best type of algorithm is outlined as 
below. 

1. At the root sublattice, initialize one path with metric 
zero. 

2. Extend each survivor path, retained from the previous
sublattice, to $M_c$ contender paths, and update the
accumulated metric for each path.

3. Sort the contender paths according to their accumulated 
metrics, and select the K-best paths. 

4. Update the path history for each retained path, and 
discard the other paths. 

5. If the iteration arrives at the end sublattice, stop the 
algorithm. Otherwise, go to Step 2. 

The best path at the last iteration is, thus, the hard decision 
output of the decoder. The advantage of the K-best 
algorithm over the sequential algorithm is its fixed 
throughput, since it is easily implemented in a parallel and a 
pipelined fashion. 
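
The five steps above can be sketched in Python as follows. This is a minimal illustration on an assumed real-valued triangularized model; the function name, the alphabet and the numerical values are hypothetical, and explicit sorting is used for clarity rather than hardware efficiency.

```python
def k_best_detect(R, y_tilde, alphabet, K):
    """K-best (breadth-first) detection on a real-valued triangular model.

    Returns (best_s, best_metric, expanded_nodes).
    """
    Mt = len(y_tilde)
    survivors = [(0.0, ())]            # step 1: one root path with metric zero
    expanded = 0
    for i in range(Mt - 1, -1, -1):    # one tree level per iteration
        contenders = []
        for metric, tail in survivors:         # step 2: extend each survivor
            # tail holds the already decided symbols s_{i+1}, ..., s_Mt
            b = y_tilde[i] - sum(R[i][i + 1 + k] * tail[k]
                                 for k in range(len(tail)))
            for a in alphabet:
                contenders.append((metric + (b - R[i][i] * a) ** 2,
                                   (a,) + tail))
                expanded += 1
        contenders.sort(key=lambda c: c[0])    # step 3: sort by metric
        survivors = contenders[:K]             # step 4: keep the K best paths
    best_metric, best_path = survivors[0]      # step 5: best path at the end
    return list(best_path), best_metric, expanded

# Hypothetical 3x3 real-valued example with a 4-level alphabet.
R = [[2.0, 0.5, -1.0],
     [0.0, 1.5, 0.25],
     [0.0, 0.0, 1.0]]
alphabet = (-3, -1, 1, 3)

s_a, m_a, n_a = k_best_detect(R, [0.7, -1.2, 2.9], alphabet, K=2)
s_b, m_b, n_b = k_best_detect(R, [5.0, 0.1, -2.2], alphabet, K=2)
print(n_a, n_b)   # same node count for both received vectors
```

The number of expanded nodes depends only on K, the alphabet size and the number of levels, not on the received vector, which is the fixed-throughput property discussed above. With K large enough to keep every path, the result coincides with the exhaustive-search ML solution.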

C. Modified Breadth first algorithm 

This algorithm provides a more efficient implementation by
considering the set of all points in the admissible set. For
an initial radius r = R, it consists of the following steps:

1. Taking the center of the sphere, find the distance of all
points in the admissible set. The point with the largest
distance is taken as the apex point (say A) for the tree
search.

2. Now, taking radius r = R1 (R1 < R), draw a sphere around
the center and consider all points lying within r = R1
(say h11). Find the distance of all these points from A.
The point with the minimum distance is selected and the
other points are pruned.

3. Continue the process with r = R2 (R2 < R1 < R) and find
the minimum-metric point.

4. Continue the process until the maximum likelihood
solution is obtained.

V. Hardware Implementation of MIMO Decoders 

Fig. 5 shows the uncoded BER performance of the K-best
algorithm for K = 5 compared to the ML algorithm.
Three observations can be made. First, the figure shows a
significant BER performance loss in the high-SNR regime.
However, in the range of interest (16-20 dB) the loss is
small. Furthermore, simulations of coded BER showed
good results with K = 5, and K = 5 is reported to be a
reasonable choice. Second, Fig. 5 clearly shows a BER
performance advantage of the RVD (real-valued decomposition)
algorithm compared to operating directly on the
complex-valued constellation points. Third, the
simplified-norm K-best algorithm ($\ell^1$-norm) leads to
almost the same BER performance as the squared
$\ell^2$-norm K-best algorithm.

A. K-Best Architecture

The K-best detector is pipelined such that one layer of the
tree is always processed in one pipeline stage (Fig. 6). Each
stage consists of a metric computation unit (MCU), a K-best
unit (KBU) that determines the K smallest PEDs, and a
register bank $L_k$ where the K smallest nodes of the
previous layer are stored. Together, they form a computation
unit. Resource sharing is applied such that the K nodes at
the input of the stage are processed one after the other. In
each cycle the MCU delivers the PEDs of all children of a
parent node in $L_k$. These PEDs need to be sorted into a
list $L_k$ where the K smallest PEDs found so far are stored.
After K iterations, all children of the nodes in $L_k$ have
been computed by the MCU. The KBU has then determined the K
smallest PEDs and delivers them to the next pipeline stage.
In total, $2M_t$ almost identical copies of the computation
unit form the $2M_t$ pipeline stages of the detector.



Fig. 5. Uncoded BER performance comparison between ML 
algorithm and different alternatives of k-Best algorithm [23] 



Fig. 6. One of $2M_t$ pipeline stages of the K-best VLSI
architecture [23].

In hardware implementation, depth-first search is realized
in a folding-like architecture because only one node is
visited at a time during the tree search. In this case,
extra memory is required to record the visited nodes for the
trace-back operation. K-best is realized in a multi-stage
pipelined way, because no trace-back is needed. To process K
data paths at the same time, a parallel architecture is
applied. Fig. 7 illustrates the basic architectures of these
two search schemes, and Table 1 summarizes their comparison
in terms of circuit metrics and algorithmic performance.
For a sphere decoder operating with a large antenna array,
the biggest implementation challenge is reducing the area of
the design. Using the number of (complex) multipliers as a
first-order area estimate, the numbers of multipliers needed
in the folding and multi-stage architectures are M and
M(M+1)/2, respectively, where M is the number of transmit
antennas. Expanding a 4x4 system to a 16x16 system increases
the relative area from 4 to 16 for the folding architecture
and from 10 to 136 for the multi-stage architecture. To keep
the area within a reasonable value, the folding technique is
considered. The second design challenge is the operating
frequency of the folded architecture.
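
The first-order area estimate quoted above (M multipliers for the folding architecture, M(M+1)/2 for the multi-stage architecture) can be reproduced with a few lines of Python. The function name is an assumption introduced for this example.

```python
def multiplier_count(M):
    """First-order area estimate in complex multipliers: (folding, multi-stage)."""
    return M, M * (M + 1) // 2

for M in (4, 16):
    folding, multi_stage = multiplier_count(M)
    print(f"M = {M}: folding = {folding}, multi-stage = {multi_stage}")
# M = 4:  folding = 4,  multi-stage = 10
# M = 16: folding = 16, multi-stage = 136
```

The folding estimate grows linearly with the number of transmit antennas while the multi-stage estimate grows quadratically, which is why folding is preferred for large arrays.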



Table 2. Implementation of K-best algorithm for 4x4
systems with 16-QAM modulation (without pre-processing).
The four [23] columns are the proposed architectures.

Reference           [23]    [23]    [23]    [23]    [18]    [17]    [24]
Technology [µm]     0.25    0.25    0.25    0.25    0.35    0.35    0.25
Norm                sq. ℓ2  sq. ℓ2  ℓ1      ℓ1      ℓ2      ℓ2      sq. ℓ2
K                   5       10      5       10      5       10      10
Core Area Sc1       90      132     68      110     91      52      215
Core Area Sc2       115     157     93      135     -       -       -
Throughput [Mb/s]   376     80      424     83      53      10      40
Loss vs. ML [dB]    0.4     0.1     0.75    0.4     0.4     0.1     0.1
Latency [cycles]    49      89      49      89      240     1280    320
Max. clock [MHz]    117     50      132     52      100     100     100


VI. Conclusion 


Fig. 7. Basic architecture of (a) depth-first (folding) and
(b) K-best (parallel and multi-stage) algorithm [11].

Table 1. Comparison of depth-first and K-best Algorithm

              Area    Throughput  Latency  Radius shrinking/  Performance
                                           tree pruning
Depth-first   Small   Variable    Long     Yes                ML
K-best        Large   Constant    Short    No                 Near-ML



Fig. 8. Design challenge and tradeoff for large antenna size.
Impact of antenna array size on (a) area (area reduction
using folding technique) and (b) critical path delay [11].



This paper surveys different MIMO detection techniques.
From the survey we conclude that the K-best algorithm with a
sort-free method gives the best result from a hardware
implementation standpoint, with a BER close to that of the
maximum likelihood (ML) solution. Area and power consumption
can be reduced further by using selective multipliers and a
better combination of adders and shifters. Better tree
search algorithms, designed with the sphere constraint in
mind, can also be used to decrease the computational
complexity.